fix: prevent duplicate eval run entries during suspend/resume #1176
Merged
smflorentino
approved these changes
Jan 22, 2026
Fixes duplicate eval run entries in StudioWeb during suspend/resume cycles.

## Problem
When running evaluations with suspend/resume, two separate entries were created in StudioWeb instead of updating the same entry:
- First entry: Created during suspend phase with "suspended" status
- Second entry: Created during resume phase with "completed" status

Both entries had the same evalSetRunId and evalSnapshot.id but different entry IDs, causing confusion in the SW UI.

## Root Cause
The CREATE_EVAL_RUN event was published on BOTH suspend execution AND resume execution (lines 516-522). This created a new database entry each time, instead of updating the existing entry on resume.

## Solution
Added a check for `self.context.resume` before publishing CREATE_EVAL_RUN. Now:
- On initial execution: Creates new eval run entry (as before)
- On resume: Skips creation, only updates the existing entry

## Impact
- Users will see a single eval run entry with complete lifecycle: pending → suspended → completed
- StudioWeb UI will show cleaner results and accurate metrics
- Trace data will have a single eval run with complete history

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add comprehensive test suite to verify that:
- Normal flow: CREATE_EVAL_RUN event is published
- Resume flow: CREATE_EVAL_RUN event is NOT published (preventing duplicates)
- UPDATE_EVAL_RUN continues to work in all scenarios
- Complete suspend/resume lifecycle operates correctly

Tests cover:
- Successful execution with CREATE_EVAL_RUN
- Suspend execution with CREATE_EVAL_RUN
- Resume skipping CREATE_EVAL_RUN
- Resume still publishing UPDATE_EVAL_RUN
- No duplicate entries on resume
- Complete suspend-then-resume lifecycle

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
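The normal-flow and resume-flow checks listed in this commit could look roughly like the sketch below. It is not the test suite from the PR: `execute_eval` is a hypothetical stand-in for the runtime's execution path, the event bus is a plain `AsyncMock`, and the payload shape is assumed.

```python
import asyncio
from unittest.mock import AsyncMock


async def execute_eval(event_bus, resume: bool) -> None:
    # Hypothetical stand-in for the runtime: create only on the initial run.
    if not resume:
        await event_bus.publish("CREATE_EVAL_RUN", {"eval_run_id": "run-1"})
    # The update is published in both the initial and the resumed execution.
    await event_bus.publish("UPDATE_EVAL_RUN", {"eval_run_id": "run-1"})


def published_events(event_bus) -> list:
    # The first positional argument of each awaited publish call is the event name.
    return [call.args[0] for call in event_bus.publish.await_args_list]


def test_normal_flow_publishes_create():
    bus = AsyncMock()
    asyncio.run(execute_eval(bus, resume=False))
    assert published_events(bus) == ["CREATE_EVAL_RUN", "UPDATE_EVAL_RUN"]


def test_resume_skips_create_but_still_updates():
    bus = AsyncMock()
    asyncio.run(execute_eval(bus, resume=True))
    assert published_events(bus) == ["UPDATE_EVAL_RUN"]
```

Asserting on the ordered list of event names covers both "CREATE is skipped on resume" and "UPDATE is still published" in one comparison.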
Add type: ignore[method-assign] comments to fix mypy errors when mocking EventBus.publish method in tests. This is a common testing pattern where we replace methods with mocks.

Fixes 3 mypy errors:
- Line 141: event_bus fixture
- Line 309: suspend phase event bus
- Line 338: resume phase event bus

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
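For readers unfamiliar with the pattern, a minimal sketch of the mocking style this commit refers to follows. The `EventBus` class here is a hypothetical placeholder, not the SDK's class, and `make_mocked_bus` is an illustrative helper name.

```python
from unittest.mock import AsyncMock


class EventBus:
    # Hypothetical minimal bus used only for this illustration.
    async def publish(self, event: str, payload: dict) -> None:
        raise NotImplementedError


def make_mocked_bus() -> EventBus:
    bus = EventBus()
    # mypy flags assignment to a method with the [method-assign] error code,
    # so the ignore is scoped to that single code rather than silencing everything.
    bus.publish = AsyncMock()  # type: ignore[method-assign]
    return bus
```

Scoping the ignore to `method-assign` keeps other type errors on the same line visible.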
f2a00d0 to
8c7bbcf
Problem
When running evaluations with suspend/resume, two separate entries are created in StudioWeb instead of updating the same entry:
Entry #1 (Suspend Phase):
Entry #2 (Resume Phase):
Both entries have the same `evalSetRunId` and `evalSnapshot.id` but different entry IDs, causing confusion in the SW UI.

Root Cause
The `CREATE_EVAL_RUN` event is published on BOTH suspend execution AND resume execution in `src/uipath/_cli/_evals/_runtime.py` (lines 516-522).

Execution Flow
Suspend Phase:
- `CREATE_EVAL_RUN` → Creates Entry #1
- `interrupt()` → Returns SUSPENDED
- `UPDATE_EVAL_RUN` → Updates Entry #1 with suspend info

Resume Phase:
- `CREATE_EVAL_RUN` AGAIN → Creates Entry #2 ❌
- `UPDATE_EVAL_RUN` → Updates Entry #2 with completion info

Solution
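Added a check for `self.context.resume` before publishing `CREATE_EVAL_RUN`. Now:
- On initial execution: creates a new eval run entry (as before)
- On resume: skips creation and only updates the existing entry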
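The original snippet is not preserved on this page, so the following is only a sketch of the guard's shape under stated assumptions. `Context`, `RecordingBus`, `run_eval`, and the payload fields are hypothetical names. Only the `resume` flag and the two event names come from the PR text.

```python
from dataclasses import dataclass, field
from enum import Enum


class EvalEvent(str, Enum):
    # Event names come from the PR text; the enum itself is hypothetical.
    CREATE_EVAL_RUN = "CREATE_EVAL_RUN"
    UPDATE_EVAL_RUN = "UPDATE_EVAL_RUN"


@dataclass
class Context:
    # Stand-in for the runtime context; only the `resume` flag matters here.
    resume: bool = False


@dataclass
class RecordingBus:
    # Hypothetical event bus that simply records what was published.
    published: list = field(default_factory=list)

    def publish(self, event: EvalEvent, payload: dict) -> None:
        self.published.append((event, payload))


def run_eval(context: Context, bus: RecordingBus, eval_run_id: str, status: str) -> None:
    # Only the initial execution creates the entry; a resume reuses it.
    if not context.resume:
        bus.publish(EvalEvent.CREATE_EVAL_RUN, {"id": eval_run_id})
    # The update is published in both phases.
    bus.publish(EvalEvent.UPDATE_EVAL_RUN, {"id": eval_run_id, "status": status})


bus = RecordingBus()
run_eval(Context(resume=False), bus, "run-1", "suspended")  # suspend phase
run_eval(Context(resume=True), bus, "run-1", "completed")   # resume phase
events = [event.value for event, _ in bus.published]
```

In this sketch the two phases together publish one create followed by two updates, which is the sequence the fix aims for.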
Impact
✅ Users will see a single eval run entry with complete lifecycle: `pending → suspended → completed`
✅ StudioWeb UI will show cleaner results and accurate metrics
✅ Trace data will have a single eval run with complete history
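To make the before/after difference concrete from the consumer's side, here is a small replay sketch. `InMemoryStore` is a hypothetical model, not the StudioWeb backend: it assumes a create event inserts a row in `pending` status and an update event appends a status to the most recent row for the same eval set run.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class EvalRunEntry:
    entry_id: int
    eval_set_run_id: str
    # Assumption: a freshly created entry starts in "pending".
    history: list = field(default_factory=lambda: ["pending"])


class InMemoryStore:
    # Hypothetical consumer: CREATE inserts a row, UPDATE appends a status
    # to the most recent row for the given eval set run.
    def __init__(self) -> None:
        self.entries: list = []
        self._ids = count(1)

    def handle(self, event: str, eval_set_run_id: str, status: str = "") -> None:
        if event == "CREATE_EVAL_RUN":
            self.entries.append(EvalRunEntry(next(self._ids), eval_set_run_id))
        elif event == "UPDATE_EVAL_RUN":
            matching = [e for e in self.entries if e.eval_set_run_id == eval_set_run_id]
            matching[-1].history.append(status)


def replay(events) -> list:
    store = InMemoryStore()
    for event in events:
        store.handle(*event)
    return store.entries


before_fix = replay([
    ("CREATE_EVAL_RUN", "set-1"),
    ("UPDATE_EVAL_RUN", "set-1", "suspended"),
    ("CREATE_EVAL_RUN", "set-1"),  # duplicate create on resume
    ("UPDATE_EVAL_RUN", "set-1", "completed"),
])
after_fix = replay([
    ("CREATE_EVAL_RUN", "set-1"),
    ("UPDATE_EVAL_RUN", "set-1", "suspended"),
    ("UPDATE_EVAL_RUN", "set-1", "completed"),
])
```

Under these assumptions, `before_fix` holds two entries sharing one eval set run ID, while `after_fix` holds a single entry whose history runs from pending through suspended to completed.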
Testing
Tested with local suspend/resume evaluation cycles to verify:
Related Documentation
Investigation documented in: `SUSPEND_RESUME_DUPLICATE_ENTRIES_INVESTIGATION.md` (backend repo)